The Block Distributed Memory Model

Authors

  • Joseph JáJá
  • Kwan Woo Ryu
Abstract

We introduce a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes parameters reflecting memory latency, communication bandwidth, and spatial locality. Our model allows the initial placement of the input data and pipelined prefetching. We use our model to develop parallel algorithms for various data rearrangement problems, load balancing, sorting, FFT, and matrix multiplication. We show that most of these algorithms achieve optimal or near-optimal communication complexity while simultaneously guaranteeing an optimal speed-up in computational complexity.

Similar resources

Parallel Block Matrix Factorizations for Distributed Memory Multicomputers

Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are...

Design and Performance Modeling of Parallel Block Matrix Factorizations for Distributed Memory Multicomputers

Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are ...

An Improvement in WRP Block Replacement Policy with Reviewing and Solving its Problems

One of the most important factors in file system performance is efficient buffering of disk blocks in main memory. Efficient buffering helps to reduce the wide speed gap between main memory and hard disks. In this buffering system, the block replacement policy is one of the most important design decisions: it determines which disk block should be replaced when the buffer is full. To o...

LUC Model: A Timestamp Ordering based View Consistency Model for Distributed Shared Memory

Excessive locking and cumulative updates in Distributed Shared Memory (DSM) not only reduce the parallelism for block access but also cause a serious degradation in response time for a dense network. This paper proposes a new consistency model in DSM named the Last Update Consistency (LUC) model, which uses a logical clock counter to keep the DSM consistent. The logical clock always incre...

A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectivel...


Journal title:
  • IEEE Trans. Parallel Distrib. Syst.

Volume 7

Publication date 1996